Automated Mapping of Coarse-Grain Pipelined Applications to FPGA Systems
Abstract
Configurable systems offer a unique opportunity to define application-specific architectures. These architectures offer performance advantages because customized pipelines can exploit the inherent parallelism of the application. In this research, we describe a set of program analyses and an implementation that automatically map a sequential, un-annotated C program into a pipelined implementation targeted to an FPGA with multiple external memories. The approach automates hardware design space exploration through a collaboration between parallelizing compiler technology and high-level synthesis tools. In previous work, we described a compiler algorithm that optimizes individual loop nests, expressed in C, to derive an efficient FPGA implementation. Here, we describe a global optimization strategy that maps multiple loop nests to a coarse-grain pipelined FPGA implementation. We focus on the space-time tradeoffs associated with differing amounts of parallelism, communication granularities, and custom data layouts. Highly optimized designs may be too large to fit within FPGA resource constraints, so we describe heuristics that reduce area requirements while minimizing the impact on global performance. We present a design space exploration algorithm for automatically deriving pipelined designs from high-level sequential specifications, which demonstrates the potential of this approach.

The configurability of FPGA hardware and the advent of multi-FPGA platforms call for new decision procedures for applying existing transformations. In this research, we investigate how techniques borrowed and adapted from existing parallelizing compiler technology can be combined with commercial synthesis tools to automatically derive realizable and efficient designs on multiple FPGA-based architectures. In particular, our contributions are as follows:

• Communication Analysis and Pipelining. We define a set of compiler analyses and transformations required to automatically design the communication for application-specific pipelines on FPGA-based architectures. We determine the best communication granularity, the corresponding communication placement points within the code, and the exact data that must be communicated between pipeline stages.
• Partition and Custom Data Layout. Our compiler algorithm finds a coarse-grain computation and data partition, along with a custom data layout. To cope with the large search space, we employ several heuristics.
• Implementation and Evaluation. We implement our analyses and present experimental results for a set of image-processing kernels.

With the growing number of transistors available on a single die, we anticipate the emergence of multiprocessor systems-on-a-chip and reconfigurable computing architectures that can incorporate (through soft cores) various coarse-grain computing elements, such as microprocessor cores and application-specific engines (ASEs). Enabling pipelined execution, communication across computing cores, task-level parallelism, and data distribution across banked memories will become increasingly important. Our analyses will enable automated application mapping onto these emerging infrastructures.
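To make the area-reduction idea more concrete, the following is a minimal Python sketch of one possible greedy heuristic. It is not the authors' algorithm or cost model. It assumes a deliberately simplified, hypothetical model: each pipeline stage (loop nest) has an unroll factor, its area grows linearly with that factor, its latency is its work divided by that factor, and steady-state pipeline throughput is limited by the slowest stage. The names `Stage`, `pipeline_latency`, `latency_if_halved`, and `fit_to_area`, as well as all numbers, are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One pipeline stage (a loop nest) under a hypothetical linear cost model."""
    name: str
    work: int           # loop iterations to execute (assumed cost unit)
    base_area: int      # control/interface area, independent of unrolling (assumed)
    area_per_copy: int  # area of one replicated datapath copy (assumed)
    unroll: int         # current unroll factor

    def area(self) -> int:
        return self.base_area + self.area_per_copy * self.unroll

    def latency(self) -> float:
        # Assumes ideal speedup from unrolling: work / unroll cycles.
        return self.work / self.unroll


def pipeline_latency(stages: list[Stage]) -> float:
    # Steady-state throughput of a coarse-grain pipeline is bounded by its slowest stage.
    return max(s.latency() for s in stages)


def latency_if_halved(stages: list[Stage], target: Stage) -> float:
    # Bottleneck latency that would result from halving only `target`'s unroll factor.
    return max(
        s.work / (s.unroll // 2 if s is target else s.unroll) for s in stages
    )


def fit_to_area(stages: list[Stage], budget: int) -> list[Stage]:
    """Greedily halve unroll factors until total area fits within `budget`.

    Each step halves the stage whose reduction increases the bottleneck
    latency the least, preferring the larger area saving on ties.
    """
    while sum(s.area() for s in stages) > budget:
        candidates = [s for s in stages if s.unroll > 1]
        if not candidates:
            raise ValueError("design does not fit even with every unroll factor at 1")

        def penalty(s: Stage) -> tuple[float, int]:
            saved = s.area_per_copy * (s.unroll - s.unroll // 2)
            return (latency_if_halved(stages, s), -saved)

        victim = min(candidates, key=penalty)
        victim.unroll //= 2  # integer halving; an unroll factor >= 2 stays >= 1
    return stages


if __name__ == "__main__":
    design = [
        Stage("smooth", work=4096, base_area=100, area_per_copy=50, unroll=8),
        Stage("threshold", work=1024, base_area=80, area_per_copy=20, unroll=8),
    ]
    fit_to_area(design, budget=600)
    for s in design:
        print(s.name, s.unroll, s.area())
    print("bottleneck latency:", pipeline_latency(design))
```

With the two hypothetical stages above and an area budget of 600, the heuristic first shrinks the lightly loaded `threshold` stage, because doing so does not change the bottleneck. Only afterward does it reduce `smooth`, ending at unroll factors 4 and 2 with a total area of 420. This reflects the stated goal of reducing area while limiting the impact on global performance. A real design space exploration would rely on synthesis estimates rather than this linear model.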
Similar resources
Search Space Properties for Mapping Coarse-Grain Pipelined FPGA Applications
This paper describes an automated approach to hardware design space exploration, through a collaboration between parallelizing compiler technology and high-level synthesis tools. In previous work, we described a compiler algorithm that optimizes individual loop nests, expressed in C, to derive an efficient FPGA implementation. In this paper, we describe a global optimization strategy that maps ...
A design flow for speeding-up dsp applications in heterogeneous reconfigurable systems
In this paper, we propose a method for speeding-up Digital Signal Processing applications by partitioning them between the reconfigurable hardware blocks of different granularity and mapping critical parts of applications on coarse-grain reconfigurable hardware. The reconfigurable hardware blocks are embedded in a heterogeneous reconfigurable system architecture. The fine-grain part is implemen...
Coarse-Grain Pipelining on Multiple FPGA Architectures
Reconfigurable systems, and in particular, FPGA-based custom computing machines, offer a unique opportunity to define application-specific architectures. These architectures offer performance advantages for application domains such as image processing, where the use of customized pipelines exploits the inherent coarse-grain parallelism. In this paper we describe a set of program analyses and an...
Design and Implementation of Digital Demodulator for Frequency Modulated CW Radar (RESEARCH NOTE)
Radar Signal Processing has been an interesting area of research for realization of programmable digital signal processor using VLSI design techniques. Digital Signal Processing (DSP) algorithms have been an integral design methodology for implementation of high speed application specific real-time systems especially for high resolution radar. CORDIC algorithm, in recent times, is turned out to...
A Coarse-Grain Hierarchical Technique for 2-Dimensional FFT on Configurable Parallel Computers
FPGAs (Field-Programmable Gate Arrays) have been widely used as coprocessors to boost the performance of data-intensive applications [1][2]. However, there are several challenges to further boost FPGA performance: the communication overhead between the host workstation and the FPGAs can be substantial; large-scale applications cannot fit in a single FPGA because of its limited capacity; mapping...
Publication date: 2004